Two-ways Adaptive Failure Detection with the φ-Failure Detector

Authors

  • Naohiro Hayashibara
  • Xavier Défago
  • Takuya Katayama
Abstract

It is widely recognized that distributed systems would greatly benefit from the availability of a generic failure detection service. Such a service can however prove useful only if it can adapt simultaneously to changing network conditions and conflicting application requirements. This paper presents a novel approach to adaptive failure detectors, called φ-failure detectors, which dynamically adapts to application requirements, as well as network conditions. The key idea is as follows. Traditionally, failure detectors maintain a set of suspected processes. The information is hence of boolean nature, that is, some process p is suspected if and only if it belongs to this set. In contrast, a φ-failure detector associates a value φp to every known process p. The value φp increases according to a normalized scale which represents the degree of confidence that process p has crashed. The scale is dynamically adapted from the current network conditions, and each application can trigger suspicions according to a threshold which corresponds to its own requirements. We describe a possible implementation for such a service, although some specific questions remain open where this work is still in progress.
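The idea of a continuous suspicion level with per-application thresholds can be illustrated with a minimal sketch. The abstract does not give a formula for φp, so the code below makes loud assumptions: the monitored process sends periodic heartbeats, inter-arrival times are treated as roughly normally distributed, and φ is taken to be the negative base-10 logarithm of the probability that the next heartbeat would still arrive later than the time already elapsed. The class name `PhiDetector` and its parameters (`window_size`, `min_std`) are hypothetical and chosen only for illustration.

```python
import math
from collections import deque
from statistics import mean, pstdev


class PhiDetector:
    """Illustrative sketch only; not the authors' implementation."""

    def __init__(self, window_size=100, min_std=0.01):
        # Sliding window of recent heartbeat inter-arrival times.
        # This window is what adapts the scale to network conditions.
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None
        self.min_std = min_std  # assumed floor to avoid a zero deviation

    def heartbeat(self, now):
        """Record the arrival time of a heartbeat from the monitored process."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Return the suspicion level at time `now` (0.0 if no history yet)."""
        if not self.intervals:
            return 0.0
        mu = mean(self.intervals)
        sigma = max(pstdev(self.intervals), self.min_std)
        elapsed = now - self.last_arrival
        # Assumed model: tail probability of a normal distribution, i.e. the
        # chance that a heartbeat arrives later than `elapsed`.
        p_later = 0.5 * math.erfc((elapsed - mu) / (sigma * math.sqrt(2)))
        p_later = max(p_later, 1e-300)  # guard against log10(0)
        return -math.log10(p_later)


detector = PhiDetector()
for t in (0.0, 1.0, 2.1, 3.0, 4.1, 5.0):
    detector.heartbeat(t)

# Two applications with different (hypothetical) thresholds read the same value.
level = detector.phi(7.0)
aggressive_suspects = level > 1.0
conservative_suspects = level > 8.0
```

In this sketch the detector only reports a number; each application decides for itself when that number is high enough to treat the process as crashed, which is the separation of concerns the abstract describes.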


Similar resources

Implementation and Performance Analysis of the φ-Failure Detector

Failure detection is a fundamental building block for ensuring fault tolerance in distributed systems. However, providing accurate and flexible failure detection in off-the-shelf distributed systems is difficult. Practical solutions to failure detection rely on some adaptive mechanism to cope with the unpredictability of networking conditions. However, while they provide reasonably good accurac...


The Φ Accrual Failure Detector

Detecting failures is a fundamental issue for fault-tolerance in distributed systems. Recently, many people have come to realize that failure detection ought to be provided as some form of generic service, similar to IP address lookup or time synchronization. However, this has not been successful so far. One of the reasons is the difficulty to satisfy several application requirements simultaneo...


Self-healing in payment switches with a focus on failure detection using State Machine-based approaches

Composition, change and complexity have attracted everyone’s attention towards Self-Adaptive systems. These systems, inspired by the human body, are capable of adapting to changes in the inner and outer environment. The main objective of this study is to achieve a more convenient availability for e-banking services in the payment switch, using self-healing systems and focusing on the failur...


On the Design of a Failure Detection Service for Large-Scale Distributed Systems

It is widely recognized that distributed systems would greatly benefit from the availability of a generic failure detection service. There are however several issues that must be addressed before such a service can actually be implemented. In this paper, we highlight the main issues related to ensuring failure detection in large-scale systems, and overview the main solutions proposed in the lit...



Publication date: 2003